
Send/receive layers to reduce buffer transfer time #49

Merged: 3 commits merged into NOAA-EMC:develop from DavidHuber-NOAA:fix/send_recv on Sep 9, 2024

Conversation

@DavidHuber-NOAA (Collaborator)

This reduces the amount of data sent/received when collecting data to be written to the output interpolated increment netCDF file.
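To illustrate the idea, here is a minimal C/MPI sketch (the actual interp_inc code is Fortran, and the decomposition, grid sizes, and names below are hypothetical, not taken from GSI-utils): each rank interpolates a block of vertical layers and sends only those layers to the writer task, rather than transferring the full 3D buffer.

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int nlev = 127, npts = 1000;     /* illustrative grid sizes */
    int lo = rank * nlev / nranks;         /* first layer this rank owns */
    int hi = (rank + 1) * nlev / nranks;   /* one past its last layer */

    double *field = calloc((size_t)nlev * npts, sizeof *field);
    /* ... each rank interpolates only field[lo*npts .. hi*npts-1] ... */

    if (rank == 0) {
        /* Receive only the layers each task owns, placed directly at the
         * correct offset, instead of transferring the full nlev*npts
         * buffer (mostly empty) from every task. */
        for (int src = 1; src < nranks; src++) {
            int slo = src * nlev / nranks, shi = (src + 1) * nlev / nranks;
            MPI_Recv(field + (size_t)slo * npts, (shi - slo) * npts,
                     MPI_DOUBLE, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        /* ... write `field` to the interpolated increment netCDF file ... */
    } else {
        /* Send only the owned layers: (hi-lo)*npts values per message
         * instead of nlev*npts. */
        MPI_Send(field + (size_t)lo * npts, (hi - lo) * npts, MPI_DOUBLE,
                 0, 0, MPI_COMM_WORLD);
    }

    free(field);
    MPI_Finalize();
    return 0;
}
```

Each message now carries (hi − lo) × npts values instead of nlev × npts, which is the buffer-transfer reduction the PR title refers to.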

@RussTreadon-NOAA (Contributor)

Tagging @CatherineThomas-NOAA for awareness. These changes should not alter output from the analcalc job, but I have not run any tests to confirm this.

@RussTreadon-NOAA (Contributor) left a comment:

Changes seem OK.

Ran the 20211221 06Z gdasanalcalc from the g-w CI C96C48_hybatmDA case twice. The first run used gsi_utils at 9382fd0. The second run used gsi_utils from DavidHuber-NOAA:fix/send_recv.

The analysis files are bitwise identical between the two runs.

interp_inc.x from this PR ran a bit slower than the original interp_inc.x.

original interp_inc.x

The total amount of wall time                        = 0.614682
The total amount of wall time                        = 0.613466
The total amount of wall time                        = 0.612958

updated interp_inc.x

The total amount of wall time                        = 1.035355
The total amount of wall time                        = 1.906664
The total amount of wall time                        = 1.134823

I'm not sure these timing differences are significant. The analysis resolution is very low, and the tests were run on Hercules using the /work/noaa/stmp fileset, which is known to produce wall-time variability.

@RussTreadon-NOAA (Contributor)

@CatherineThomas-NOAA, do you have a COMROOT from a GFS v17 experiment run at operational resolution (C768 deterministic, C384 ensemble)? If not, do we have cold-start initial conditions for GFS v17 at this resolution?

I ask because I would like to see how the revised interp_inc.x in this PR performs at operational resolution.

@DavidHuber-NOAA (Collaborator, Author)

@RussTreadon-NOAA I have some ICs available in /lfs/h2/emc/nems/noscrub/david.huber/save/global_ICs/768/2021122018. Thank you for performing the tests!

@DavidHuber-NOAA (Collaborator, Author)

I also have recent experiments on Hera:

COM: /scratch1/NCEPDEV/global/David.Huber/para/COMROOT/C768_2
EXP: /scratch1/NCEPDEV/global/David.Huber/para/EXPDIR/C768_2

@RussTreadon-NOAA (Contributor)

Cactus test

Used @DavidHuber-NOAA's initial conditions to cold start an operational-resolution parallel with g-w develop at 7a724e0. The parallel only covers the cold-start half cycle at 20211220 18Z through the full cycle at 20211221 00Z.

Ran the gdas jobs twice for 20211221 00Z. The first run used gsi_utils.fd @ 9382fd0. The second run swapped in DavidHuber-NOAA:fix/send_recv @ dd449af for gsi_utils.fd.

gdasanalcalc ran to completion in both runs, and its output is bitwise identical between the two. Differences were noted in the wall times:

original gsi_utils.fd @ 9382fd0

The total amount of wall time                        = 34.778691
The total amount of wall time                        = 31.292571
The total amount of wall time                        = 30.540251
The total amount of wall time                        = 93.992788
The total amount of wall time                        = 20.559132
The total amount of wall time                        = 20.539392
The total amount of wall time                        = 20.439000
The total amount of wall time                        = 12.873213

End analcalc.sh at 10:49:57 with error code 0 (time elapsed: 00:04:42)

DavidHuber-NOAA:fix/send_recv @ dd449af

The total amount of wall time                        = 48.116189
The total amount of wall time                        = 52.347500
The total amount of wall time                        = 42.632630
The total amount of wall time                        = 101.478578
The total amount of wall time                        = 25.671915
The total amount of wall time                        = 34.067406
The total amount of wall time                        = 28.705626
The total amount of wall time                        = 18.753862

End analcalc.sh at 20:31:32 with error code 0 (time elapsed: 00:06:17)

gdasanalcalc using the changes in this PR took 1 minute, 35 seconds longer to run. This represents a 33.7% increase in total wall time with respect to the original gsi_utils.fd.
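For reference, the elapsed times above are 00:06:17 (377 s) versus 00:04:42 (282 s), a difference of 95 s, and 95/282 ≈ 33.7%.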

The wall times above correspond to the following executables:

interp_inc.x for inc.fullres.03
interp_inc.x for inc.fullres.06
interp_inc.x for inc.fullres.09
calc_anal.x for anl.06
calc_anal.x for anl.ensres.03
calc_anal.x for anl.ensres.06
calc_anal.x for anl.ensres.09
gaussian_sfcanl.x

This PR only changes interp_inc.x. Interestingly, every executable in the run using the updated gsi_utils.fd shows an increased wall time.

Given this, I reran the 20211221 00Z gdas cycle using gsi_utils.fd built from DavidHuber-NOAA:fix/send_recv. Below are the gdasanalcalc executable wall times from this rerun:

The total amount of wall time                        = 43.677988
The total amount of wall time                        = 44.119262
The total amount of wall time                        = 43.731211
The total amount of wall time                        = 104.130527
The total amount of wall time                        = 21.378497
The total amount of wall time                        = 24.690298
The total amount of wall time                        = 22.647190
The total amount of wall time                        = 14.372008

End analcalc.sh at 14:31:28 with error code 0 (time elapsed: 00:05:47)

The rerun is faster than the first run but still 65 seconds (23%) slower than the control. The interp_inc.x timings are more consistent in the rerun but remain 10 to 14 seconds longer than the control.

While the above is not conclusive, it suggests that the changes in this PR increase the interp_inc.x wall time. The analcalc job is relatively quick; still, any increase in operational job runtimes needs to be documented, explained, and accepted.

Tagging @CatherineThomas-NOAA for awareness.

@DavidHuber-NOAA (Collaborator, Author)

@RussTreadon-NOAA @aerorahul @CatherineThomas-NOAA

I performed a series of tests on WCOSS2 Dogwood at C96 through C768 resolutions, running 5 cycles at each resolution and comparing runtimes between the develop and fix/send_recv branches. Below is a chart showing the mean runtimes of the 15 interp_inc.x executions at each resolution and the differences.

[Chart: mean interp_inc.x runtimes by resolution, develop vs. fix/send_recv]

And here are the compared runtimes for the gdasanalcalc job over the 5 cycles and at each resolution.

[Chart: gdasanalcalc job runtimes over 5 cycles at each resolution]

Attached is the spreadsheet where these were calculated.

interp_inc_runtimes.xlsx

Overall, this change appears to result in a slight increase in runtimes, most noticeably at C768, where the mean job runtime increased by 11 s (~3.35%).

@aerorahul (Contributor)

@DavidHuber-NOAA, thanks for the thorough testing and for sharing the results. The changes shown here seem acceptable.

@aerorahul (Contributor) left a comment:

Looks good. As documented in the PR:

  • changes are reproducible
  • the increase in job runtime is under 4%

@CatherineThomas-NOAA (Contributor)

Thank you @DavidHuber-NOAA and @RussTreadon-NOAA for your rigorous testing on this. The results look reasonable to me.

@aerorahul (Contributor)

Merging based on approval comments from @CatherineThomas-NOAA.

@aerorahul merged commit bb0138d into NOAA-EMC:develop on Sep 9, 2024. 4 checks passed.
DavidHuber-NOAA added a commit to DavidHuber-NOAA/GSI-utils that referenced this pull request on Sep 13, 2024:

* origin/develop:
  Send/receive layers to reduce buffer transfer time (NOAA-EMC#49)

jswhit pushed a commit to jswhit/GSI-utils that referenced this pull request on Dec 8, 2024.